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Abstract: We introduce a new copula-based correction for generalized 
linear mixed models (GLMMs) within the integrated nested Laplace ap¬ 
proximation (INLA) approach for approximate Bayesian inference for la¬ 
tent Gaussian models. While INLA is usually very accurate, some (rather 
extreme) cases of GLMMs with e.g. binomial or Poisson data have been 
seen to be problematic. Inaccuracies can occur when there is a very low 
degree of smoothing or “borrowing strength” within the model, and we 
have therefore developed a correction aiming to push the boundaries of the 
applicability of INLA. Our new correction has been implemented as part of 
the R-INLA package, and adds only negligible computational cost. Empir¬ 
ical evaluations on both real and simulated data indicate that the method 
works well. 
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1. Introduction 

Integrated Nested Laplace Approximations (INLA) were introduced by Rue, 
Martino and Chopin (2009) as a tool to do approximate Bayesian inference 
in latent Gaussian models (LGMs). The class of LGMs covers a large part of 
models used today, and the INLA approach has been shown to be very accurate 
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Fig 1. Minimal example defined in Equation (1). The histograms show posterior distributions 
from a long MCMC run (ten chains of one million iterations each), the black curves show 
the posteriors from INLA, while the red curves show the posteriors using our new correction 
to INLA . 


and extremely fast in most cases. Software is provided through the R-INLA 
package, see http://www.r-inla.org. 

An important subclass of LGMs is the rich family of generalized linear mixed 
models (GLMMs) with Gaussian priors on fixed and random effects (Breslow 
and Clayton, 1993; McCulloch, Searle and Neuhaus, 2008). The use of INLA for 
Bayesian inference for GLMMs was investigated by Fong, Rue and Wakefield 
(2010), who reanalyzed all of the examples from Breslow and Clayton (1993). 
Fong, Rue and Wakefield (2010) found that INLA works very well in most 
cases, but one of their examples shows some inaccuracy for binary data with 
few or no replications. In this paper, we introduce a new correction term for 
INLA, significantly improving accuracy while adding negligibly to the overall 
computational cost. 

To set the scene, we consider a minimal simulated example illustrating the 
problem (postponing more thorough empirical evaluations until Section 3). Con¬ 
sider the following model: For i = 1,2, let Prob(j/i = 0) = 1 — Pi, 

Prob(?/j = 1) = pi, and 

logit (pi) =P + Ui, (1) 

where tq ~ V(0, er 2 ), iid. Let the precision er" 2 have a Gamma(l, 1) prior, while 
the prior for (3 is V(0,1). We simulated data from this model, setting n = 100, 
er 2 = 1 and /3 = 2. Figure 1 shows the resulting posterior distributions for 
the intercept /? and for the log precision, log(cr" 2 ), where the histograms show 
results from long MCMC runs using JAGS (Plummer, 2013), the black curves 
show posteriors from INLA without any correction, and the red curves show 
results using the new correction defined in Section 2. While some of our later 
examples show more dramatic differences between INLA and long MCMC runs, 
these results exemplify quite well our general experience with using INLA for 
“difficult” binary response GLMMs: Variances of both random and fixed effects 
tends to be underestimated, while the means of the fixed effects are reasonably 
well estimated. 
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Linear predictor 



Linear predictor 

Fig 2. Log-likelihood (top panel) and derivative of log-likelihood (bottom panel) for a single 
Bernoulli observation as a function of the linear predictor in a logistic model. 


One part of the problem is that the usual assumptions ensuring asymptotic 
validity of the Laplace approximation do not hold here (for details on asymp¬ 
totic results, see the discussion in Section 4 of Rue, Martino and Chopin (2009)). 
The independence of the random effects make the effective number of param¬ 
eters (Spiegelhalter et ah, 2002) on the order of the number of data points. In 
more complex models, there is often some amount of smoothing or replication 
that alleviates the problem, but it may still occur. Except in the case of spline 
smoothing models (Kauermann, Krivobokova and Fahrmeir, 2009), there is a 
lack of strong asymptotic results for random effects models with a large effec¬ 
tive number of parameters. In the simulation from model (1), the data provide 
little information about the parameters, with the shape of the likelihood function 
adding to the problem. Figure 2 illustrates the general problem. Here, the top 
panel shows the log-likelihood of a single Bernoulli observation Y as a function 
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of the linear predictor r /, i.e. log(Prob(Y = 1)) = log(p) where logit(p) = r). The 
bottom panel shows the corresponding derivative. We see that the log-likelihood 
gets very flat (and the derivative near zero) for high values of 77 , so inference 
will get difficult. 

Bayesian and frequentist estimation for GLMMs with binary outcomes has 
been given some attention in the recent literature (Capanu, Gonen and Begg, 
2013; Grilli, Metelli and Rampichini, 2014; Sauter and Held, 2015), but a com¬ 
putationally efficient Bayesian solution appropriate for the INLA approach has 
been lacking. An alternative to our new approach would be to consider higher- 
order Laplace approximations (Shun and McCullagh, 1995; Raudenbush, Yang 
and Yosef, 2000; Evangelou, Zhu and Smith, 2011), other modifications to the 
Laplace approximation (Ruli, Sartori and Ventura, 2015; Ogden, 2015), or ex¬ 
pectation propagation-type solutions (Cseke and Heskes, 2010), but we view 
them as too costly to be applicable for general use in INLA. The motivation 
for using INLA is speed, so we see it as a design requirement for any correction 
that it should add minimally to the running time of the algorithm. 

We proceed as follows. In Section 2, we present a derivation of our new cor¬ 
rection method. Section 3 presents empirical studies, both on real and simulated 
data, showing that the method works well in practice. Finally Section 4 gives a 
brief discussion and some concluding remarks. 


2. Methodology 


Consider a latent Gaussian model (Rue, Martino and Chopin, 2009), with hy¬ 
perparameters 0 = (6 1 ,..., Op)' , latent field x = (x ±,..., x n ) r and observed data 
y = {yi : i £ 1} (for I C {l,...,n}), where the joint distribution may be 
written as 

ir{x,0,y) = Tr(0)Tr(x\0)Y[^{yi\xi,0) 
iex 


where tt(x\0) is a multivariate Gaussian density. We want to approximate the 
posterior marginals ir(xi\y) and Tr(0j\y). The Laplace approximation of Tr(0\y) 
is 


tt(%) oc 


n(x,0,y) 

VG{x\0,y) 


x=p(6) 


( 2 ) 


where ttg{x\ 0, y) is a Gaussian approximation found by matching the mode and 
the curvature at the mode of 7 r(x| 0,y), and y(0) is the mean of the Gaussian 
approximation. 

Given Tr(0\y) and some approximation Tr(xi\0,y) (see below), the posterior 
marginals of interest are calculated using numerical integration: 

H0j\y) = J Tr(0\y)d0-j, 


*(xi\y) 


Tf(xi\0,y)Tt(0\y)d0. 


( 3 ) 
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In the current implementation of INLA, the n(xi\9 , y) used in (3) are approx¬ 
imated using skew normal densities irsN^il#, y) (Azzalini and Capitanio, 1999), 
based on a second Laplace approximation; see Section 3.2.3 of Rue, Martino 
and Chopin (2009) for details. Notice that, in Equation (2) we use a Gaussian 
approximation 7 tg(x| 9,y), with marginals TTG{xi\9,y), i = 1,... ,n. Thus, both 
^SN(xi\6,y) and ttg(x\ 9, y) are approximations to the marginals 7r(x,;|0, y), but 
the irgN (x% |$, y) are more accurate since they are based on a second Laplace ap¬ 
proximation. In (2) we need to approximate the full joint distribution tt(x\ 9,y). 
Our basic idea is to use the improved approximations ^SN{xi\9,y) in order 
to construct a better approximation to the joint distribution n(x\9,y). We 
aim for an approximation of n(x\9,y) that retains the dependence structure 
of the Gaussian approximation ttg{x\9, y), while having the improved marginals 
^SN{xi |0, y). This can be achieved by using a Gaussian copula. 

Before we describe the copula construction, we need to define some notation. 
First, for i = 1 ,...,n, let jli(9) and of((9) denote the mean and variance of 
each marginal nsN{xi\9,y), and let Fi be the cumulative distribution function 
corresponding to TrsN(xi\9, y). Second, let Xi ~ Fi and assume that Fi is the 
distribution of Zi = (xi — jdi{9)) / ai(9). As usual, $ denotes the cumulative stan¬ 
dard Gaussian distribution function. Furthermore, let Ri(9) and of (9) denote 
the marginal means and variances of the Gaussian approximation ttg(x\9, y), 
let Q(9) be the precision matrix of TTQ(x\9,y), and let x = (aq,... ,x n ) where 
x ~ Trc(x\9,y), and define Zi = (Xi — fii(9))/<Ji(9), i= 1 Note that we 

have of (9) = erf (9) from the definition of 7rsN(xi\9, y) (the construction of the 
skew normal changes the mean and adds skewness, but keeps the variance un¬ 
changed; again, see Section 3.2.3 of Rue, Martino and Chopin (2009) for detailed 
explanations), so from here on we denote both simply by of (9). 

We will now show how to construct a joint distribution having marginals 
Fi and the dependence structure from nG(x\9,y), using a Gaussian copula (see 
e.g. Nelsen (2007) for a general introduction). First, note that <&(zj) ~ U[0,1] 
by the probability integral transform (PIT). Let Zi = -F) -1 ^^)). Applying 
the inverse of the PIT then yields that Zi ~ Fj, from which it follows that 
Xi = ai(9)zi + jli{9) is distributed as Xi ~ F), which is the marginal distribution 
we want. Since we have only done marginal transformations, the dependence 
structure of the original x = (xi,... x n )' is still intact. Thus, to construct the 
new approximation to the joint distribution 7f(x|0, y), we define the transformed 
value xi as follows: 



We may simplify the construction above by replacing the Fi in (4) by 4*. 
This means that we do not correct for skewness, but we take advantage of 
the improved mean fii{9) from nsN{xi\9, y). We denote this as the “mean only” 
correction. (We will later discuss the possibility of retaining Fj as a skew normal; 
this we denote as the “mean and skewness” correction.) In the simple “mean 
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only” case, the transformation reduces to a shift in mean: 

Xi=Xi— Hi(0) + 


the Jacobian is equal to one, and the transformed joint density function is a 
multivariate normal with mean /2 and precision matrix Q , i.e. 

log7r(i| 0,y) = ^ log |Q(0)| - ^(x - Q(6)(x - p,(6)) + constant (5) 

In the Laplace approximation defined in Equation (2), both the numerator and 
the denominator should be evaluated in the point x = /x(0), where /x(0) = 
(0),..., n n {9))' is the mean of the Gaussian approximation ttg{x\ 6, y). Thus, 
the density functions above should be evaluated in x = n{9). From Equation (2), 
the original (uncorrected) log posterior is 


log7f(0|y) = log7 t(x,0,u) - log nG(x\9,y) + constant (6) 


evaluated at x = /x(0), where 


logTf G (x\e,y)\ x=Ke) 


— log |<2(0)1 + constant, 


( 7 ) 


Comparing equations (5), (6), and (7), we see that the copula approximation can 
be implemented by adding the term (7(0) to the already calculated log posterior 
evaluated at /z(0), where 

cw = \w) - mYQmm - m)- 

The addition of the term (7(0) does not add significantly to the computational 
cost of INLA - this simple operation is essentially free. 

For the INLA implementation of the copula correction, we have found that 
it is sufficient to only include fixed effects (including any random effects of 
length one) in the calculation of <7(0). The effect of the correction is strongest 
and most consistent for the fixed effects, while the (often very numerous) ran¬ 
dom effects contribute very small individual effects to the correction, mainly 
adding extraneous noise to the estimation. For these reasons, including only 
fixed effects gives better numerical stability and also seems to provide a more 
accurate approximation, while reducing computational costs. Conceptually, in¬ 
cluding only the fixed effects involves finding E(0) = <2(0) _1 , and then again 
finding Qj{6) = £ j{6) 1 (where J is the index set of the fixed effects), which 
might seem computationally costly. However, it can be done cheaply by using 
the linear combination feature described in Section 4.4 of Martins et al. (2013): If 
rtf is the number of fixed effects, only the (parallel) solution of a n/-dimensional 
linear system is needed. 

Additionally, to guard against over-correction, we perform a soft thresholding 
on (7(0), as follows: First we define a sigmoidal function f(t): 


m 


1 + exp(—2 1) 
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which is increasing, has derivative equal to one at the point £ = 0 , and where 
/(£)—» 1 as £ —>• oo. Then we replace C(9) by C t (9), where 

c t (e) = uf(C(o)/u), 

for u = nf( and with the “correction factor” parameter £ > 0 determining the 
degree of shrinkage (more shrinkage for smaller values of £). Since the function 
/(£) is approximately linear with unit slope around zero, Ct{9) will be close to 
C{9) for small and moderate values of C(9 ), while larger values will be increas¬ 
ingly shrunk toward zero. Note that since /(£) < 1 for all £, C t {9) < u. The 
value of £ does not have a large impact on the results unless a too small value 
is chosen. Its main purpose is as a safeguard to avoid too large corrections in 
very difficult cases. In our experience £ = 1 gives too strong shrinkage, while for 
example £ = 100 corresponds to no shrinkage, so it seems clear that £ should 
be somewhere in between these extremes. We have found that £ = 10 is a good 
choice, letting the correction do its job while guarding against too large changes, 
and we have used this value for all of the examples. Results appear to be very 
robust to the exact value chosen for £. Note also that since the correction effect 
u is scaled with the number n/ of fixed effects, it is less surprising that a single 
value for £ could work well in a wide variety of circumstances. 

As mentioned, we have also investigated a more general case of the copula 
construction, where we retain Fi as a skew normal distribution, i.e. the CDF 
of FsN{xi\9,y). This results in a more complicated correction term C s ] iew (9) : 
derived in Appendix A. We have not found any appreciable differences in the 
accuracy compared to the simpler case without skewness, so we have concluded 
that the non-skew version is preferable due to its simplicity. We will show both 
the skew and the non-skew correction for the toenail data discussed in Sec¬ 
tion 3.2, but otherwise we show only results from the simpler non-skew version. 
We have tried both corrections on many (both real and simulated) data sets, 
and never seen a significant difference in the results. 


3. Empirical results 

3.1. Have we solved the problems detected by Fong et al. (2010)? 

As mentioned in the Introduction, Fong, Rue and Wakefield (2010) studied the 
use of INLA for binary valued GLMMs, and they showed that the approxima¬ 
tions were inaccurate in some cases. We have redone the simulation experiment 
described on pages 10-14 of the Supplementary Material of Fong, Rue and 
Wakefield (2010), both for INLA without any correction, and INLA with the 
“mean only” correction described in Section 2. 

In the original simulation study by Fong, Rue and Wakefield (2010), Yjj are 
iid Binomial)™,p,; ; ), with i = 1,..., 100 clusters, j = 1,... ,7 observations per 
cluster, and m € {1,2,4, 8 }. Given Xi = 0 for i < 50 and Xi = 1 otherwise, 
and sampling times t = (t\,. .., £ 7 )' = (—3, —2, —1, 0,1, 2, 3)', the following two 
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Table 1 

Results from simulation study from model (8) 



Averages of posterior 

means: 





CT 0 

^0 

log °o 

Po 

Pi 

P2 

Pa 

Tme values 

1.000 

1.000 

0.000 

-2.500 

1.000 

- 1.000 

-0.500 

Uncorrected INLA 

0.705 

0.722 

1.133 

-2.494 

0.998 

-1.052 

-0.486 

Corrected INLA 

0.952 

0.850 

0.775 

-2.562 

1.024 

-1.080 

-0.504 

MCMC 

0.946 

0.849 

0.773 

-2.537 

1.017 

-1.081 

-0.482 

Comparison between INLA 

and MCMC, (E(INLA)-E(MCMC))/sd(MCMC): 



cro 

log °o 

Po 

Pi 

p2 

P3 

Uncorrected INLA 

-0.382 

-0.403 

0.390 

0.120 

-0.127 

0.052 

-0.016 

Corrected INLA 

-0.003 

0.000 

0.002 

-0.073 

0.046 

-0.002 

-0.101 

Ratio of variances 

Var(INLA) /Var(MCMC): 




a o 

^0 

log cr 0 2 

Po 

Pi 

p2 

Ps 

Uncorrected INLA 

0.585 

0.812 

1.174 

0.822 

0.834 

0.882 

0.889 

Corrected INLA 

0.933 

0.956 

0.998 

0.904 

0.871 

0.943 

0.908 

Average coverage of 95% intervals from INLA over MCMC samples: 


a o 

&0 

log cr 0 2 

Po 

Pi 

p2 

Ps 

Uncorrected INLA 

90.3% 

90.0% 

90.8% 

92.6% 

92.7% 

93.5 % 

93.5% 

Corrected INLA 

94.2% 

93.9% 

94.4% 

93.5% 

93.1% 

94.3% 

93.7% 


models were considered: 


logit pij = /3 0 + pitj + /3 2 Xi + /3 3 tjXi + b 0 i (8) 

logit Pij = P 0 + Pitj + P 2 Xi + /3 3 tjXi + b 0i + butj, (9) 

which corresponds to models (0.7) and (0.8) on page 11 of the Supplementary 
Material of Fong, Rue and Wakefield (2010). 

We first consider model (8). We only show the results for m = 1 (i.e. binary 
data), as this is the most difficult case with the largest errors in the approx¬ 
imation. The correction also works well for m > 1, but this case is easier to 
deal with for INLA. This is seen empirically, and is also as expected based on 
considering the asymptotic properties of the Laplace approximation: for m > 1 
there is more “borrowing of strength” /replication, so the original approxima¬ 
tion should work better. We use the same settings as Fong, Rue and Wake¬ 
field (2010): boi ~ud A^(0,ctq) where a q = 1, the prior Gamma(0.5,0.0164) 

for ctq 2 and iV(0,1000) priors for the /3j. The true values of the fixed effects 

are /? = (/3 0 , /3i, /? 2 , /? 3 ) / = (—2.5,1.0,—1.0,—0.5)'. We made 1,000 simulated 
data sets, running INLA both with and without the new correction, as well as 
very long MCMC chains using JAGS (each of the 1,000 datasets were run with 
1,000,000 MCMC samples after a burn-in of 100,000, using every 100th sample). 

Results from the simulation study are shown in Table 1. Note that the aim 
here is to be as close as possible to the MCMC results, not the true values. The 
upper part of the table shows averages of the posterior means over the 1,000 
simulations. We see that INLA gets much closer to the MCMC results for all 
parameters except (3 3l which is in any case reasonably close to the MCMC value. 
The improvement is particularly large for the variance parameter. This is also 
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Table 2 

Summary statistics for computation times in seconds for each data set 



Min. 

1st Qu. 

Median 

Mean 

3rd Qu. 

Max. 

Uncorrected INLA 

1.70 

2.25 

2.39 

2.43 

2.56 

3.41 

Corrected INLA 

1.77 

2.58 

2.69 

2.74 

2.86 

3.89 


seen in the second panel, which shows (E(0 INLA ji/) —E(0 MCMC |y))/sd(0 MCMC |2/) 
for each parameter 9 (averaged over the 1,000 simulations), i.e. the difference in 
INLA and MCMC estimates scaled by the MCMC standard deviation. Here, the 
random effects variance and the fixed effects except f3z are also more accurately 
estimated. The third lower panel shows the ratios Var(0 INLA |y)/Var(0 MCMC |y); 
here the correction improves the estimation of the variance for all parameters. 
For CTq, (To and log a^ 2 we get very close to a ratio of one, and there are also 
major improvements for the fixed effects variances. Finally, the bottom panel 
shows average coverage of 95% (i.e., (< 70 . 025 , < 70 . 975 )) credible intervals from INLA 
over the MCMC samples for each simulated data set. Clearly, coverage is im¬ 
proved considerably by the correction. Table 2 reports summary statistics for the 
computation times in seconds over the 1,000 data sets. Note that in this case 
computational times are abnormally high due to somewhat extreme parame¬ 
ter settings - INLA will usually be much faster. However, ratios of computing 
times for the corrected vs the uncorrected versions should stay approximately 
the same. 

Appendix B contains additional simulation studies: Results from simulations 
for model (9) for different values of the covariance matrix of (6oi, bu)' are shown 
in Appendix B.l. Furthermore, in Appendix B.2 we consider the effect of hav¬ 
ing extremely few observations per cluster, while we in Appendix B.3 study 
a misspecified model, simulating from model (9) while estimating model (8). 
The correction appears to work well for all the different cases considered in 
Appendix B. 

3.2. What happens when the random, effects variance increases? 

We start by discussing the toenail data, which is a classical data set with a 
binary response and repeated measures (De Backer et ah, 1998; Lesaffre and 
Spiessens, 2001). The data are from a clinical trial comparing two competing 
treatments for toenail infection (dermatophyte onychomycosis). The 294 pa¬ 
tients were randomized to receive either itraconazole or terbinafine, and the 
outcome (either “not severe” infection, coded as 0, or “severe” infection, coded 
as 1) was recorded at seven follow-up visits. Not all patients attended all the 
follow-up visits, and the patients did not always appear at the scheduled time. 
The exact time of the visits (in months since baseline) was recorded. For indi¬ 
vidual i, visit j, with outcome y^, treatment Trti and time Tirne,^ our model 
is then 

yij ~ Bernoulli (p^) 

logit Pij = a 0 + cerrtTrtj + aTimeTime^ + arTTrLTime^ + bi 

b z ~ N{0,a 2 ). 
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Fig 3. The top panel shows the posterior density of the hyperparameter (log precision) for 
the toenail data. The histogram is from one million MCMC samples from JAGS, the blue 
curve is from uncorrected INLA, the red curve from INLA with the “mean only” correction, 
and the green curve from INLA with the “mean and skewness” correction. The bottom panel 
shows the additive corrections to the log posterior for the toenail data for the “mean only” 
and the “mean and skewness” correction. 


Notice that this is the same model as model (8), except that the time variable 
here varies over individuals. Normal priors with mean zero and variance 10 4 
were used for ao, ckTrt, c^Time; and citt - INLA underestimates cr 2 quite severely. 
The top panel of Figure 3 shows the different estimates of the posterior dis¬ 
tribution of the log precision, logcr~ 2 . The histogram shows the results from a 
long MCMC run using JAGS, the black curve shows the posterior from INLA 
without the correction, the red curve shows the simple (non-skew) version of the 
INLA correction, while the green curve shows the INLA correction accounting 
for skewness (as discussed in Appendix A). The bottom panel of Figure 3 shows 
the additive correction to the log posterior density, as a function of the hyper¬ 
parameter (log precision). We see that there is very little difference between the 
two corrections. 
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a = 0.1 o=1 



c = 3.5 o = 4 



o = 1.5 



o = 3 



Fig 4. Posteriors densities of log precision from the toenail simulation experiment for dif¬ 
ferent values of the random effects standard deviation a (the value of o is shown above each 
histogram). The histograms are from long MCMC runs, uncorrected INLA are shown as black 
curves, while the red curves shows INLA with the correction. Since the goal here is to study 
the difference between MCMC and INLA, we omit axes - the relevant scale is given by the 
MCMC variances, which are evident from the histograms. 


For the toenail data, the estimated random effects standard deviation is ap¬ 
proximately (j = 4, which is very high. To investigate how the copula correction 
works as rr increases, we studied simulated data sets from the model above, 
where we set cr to different values, and where the a parameters were fixed to 
the values from a long MCMC run using the real toenail data (i.e., we simulate 
only the outcome, keeping the covariates unchanged). Results are shown in Fig¬ 
ure 4 for different values of a ranging from cr = 1 to a = 16. We clearly get very 
accurate corrected posteriors for a < 4. For a > 4, we gradually get a tendency 
of under-correction. 

3.3. Simulated example with Poisson likelihood 

We shall now study the case where the data are Poisson distributed. We con¬ 
sider a simple simulated (extreme) example in order to investigate how well 
the correction works in the Poisson case. For i = l,...,n we generated iid 
Di ~ Poisson(p,i) where 


log (pi) = /3 + Ui, 
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(3 = 1 (3 = 0.75 (3 = 0.5 



Fig 5. Posteriors densities of log precision from the Poisson simulation experiment for differ¬ 
ent values of the intercept (3 (the value of (3 is shown above each histogram). The histograms 
are from long MCMC runs, uncorrected INLA posteriors are shown as black curves, while 
the red curves shows INLA posteriors with the correction. 


with m ~ N(0,cr 2 ). We chose n = 300 and a 2 = 1, a Gamma(l,l) prior 
for the precision ct _ 1 , and a iV(0,l) prior for /?. Figure 5 shows the results 
for different values of the intercept j3. Each histogram is based on ten parallcll 
MCMC runs using JAGS, each with 200,000 iterations after a burn-in of 100,000. 
Here, reduction of /? implies that estimation is more difficult, since negative /3 
with large absolute value will imply that the counts yi are very low, with many 
zeroes, and the data are uninformative. We see that uncorrected INLA tends to 
get less accurate as /3 moves towards more extreme values, while the correction 
seems to work well. 

3.4- What if the latent structures are more complicated? 

Until now, we have considered fairly simple latent structures, where the ran¬ 
dom effects have been iid (multivariate) normally distributed. The reader may 
perhaps wonder if the generality implied by having “latent Gaussian models” 
in the title is really justified - what if latent structures are more complicated, 
with for example temporal or spatial structure? 
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In fact, the complexity of the latent field is not particularly relevant for 
the accuracy problems we study here. This can be seen by considering the basic 
formulation of the latent Gaussian model together with the main building blocks 
of the INLA machinery: Essentially, the latent structure is contained within 
the Gaussian part, for which the computations are exact (and fast, since the 
precision matrix of the Gaussian part will usually be sparse). In a sense, the 
LGM approach separates the estimation in an “easy” (Gaussian) part and a 
“difficult” part. It is perhaps somewhat counterintuitive at first sight that the 
dynamic/time-series/spatial model constitutes the “easy” part! In this paper we 
have in fact considered the “difficult” part, aiming to choose examples at the 
boundary of what we considered to be feasible. Thus, we argue that our general 
title is indeed justified. 

We illustrate this with a simple simulated example where the latent structure 
is auto-regressive of order one (AR1), using a similar setup as in the “minimal” 
example presented in the Introduction. For i = 1,2,... ,n, let Prob(yj = 0) = 
1 - pi, Prob (yt = 1) = Pi, and 


logit (pi) = P + Ui, 


( 10 ) 


where the ?/,; are now given an AR1 model, as follows: u\ ~ N(Q, n~ l ), = 

pui -1 + Si (where £* ~ 7V(0, r -1 )) for * = 2,3,..., n, where k = r( 1 — p 2 ) is the 
marginal precision. Define 


6i = log(«) 


and 



which is the parameterization used internally in INLA. We use a Gamma(l, 1) 
prior for k, and A r (0,l) priors for both /? and $ 2 - Data was simulated from 
model (10), using p = 0.5, r = 1 and (3 = 2 as the true values. As in the 
example in the Introduction, we made long MCMC runs and compared the 
results to INLA both with and without the correction. The results are shown in 
Figure 6. Again, it is clear that the overall accuracy of INLA is improved using 
the correction. 

4. Discussion 

The binary (and, more generally, binomial) GLMMs discussed in Sections 3.1 
and 3.2 are are important in many applications, particularly for biomedical 
data. Poisson GLMMs are also of great interest, and among the difficult cases 
here are point processes such as log-Gaussian Cox processes (Illian, Sprbye and 
Rue, 2012; Simpson et ah, 2013), where data are typically extremely sparse: 
essentially there are ones at the observed points, and zeroes everywhere else. 
Our example in Section 3.3 shows a stylized, extreme case of this type. Studying 
the correction for the full log-Gaussian Cox process case could be a topic for 
future work. Even though the point process case may be difficult, there will often 
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intercept 




Fig 6. AR1 example defined in Equation (10). The top panel displays results for the intercept 
parameter (3, the middle panel shows results for 6 \ = log{n), and the bottom panel show results 
for 62 = log the “internal p” of INLA. The histograms show posterior distributions 

from a long MCMC run (ten chains of one million iterations each), the black curves show 
the posterior from INLA, while the red curve shows the posteriors using our new correction 
to INLA. 
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be some degree of smoothing and/or replication making inference easier, so real 
data sets should be less extreme than the simulated example in Section 3.3. 

From the results in this paper, it appears that the copula correction is robust 
and works well. There is no general theory guaranteeing that the method will 
always work under all circumstances, but we feel that the intuition underlying 
the method is quite strong. Since INLA for LGMs is quite accurate in most 
cases, the correction is not needed in general, only for problematic cases such 
as those discussed in this paper. Using the copula correction method, we can 
stretch the limits of applicability of INLA, while maintaining its computational 
speed. 


Appendix A: Copula correction accounting for skewness 


Let Fi denote the “standardized” skew normal CDF corresponding to 
FSN{xi\0,y). We start by finding the Jacobian of the transformation defined 
in Equation (4). Note first that, immediately from (4) 


Fi 


fxi- 

\ J 


= $ 


( Xi-m{6) \ 

\ )■ 


Letting fi and <j> denote the density functions corresponding to Fi and $, dif¬ 
ferentiating with respect to Xi then gives 


Xj - jii{6) \ 1 _ , / Xi - dxj 1 

<Ji{9) J cn(9) V ai{9) ) dxi <Ji(9) 


so 



and the Jacobian of the transformation is ff 1 since = 0 for i ^ j and 
f|j > 0 for all i. 

Note that (a; - /j,(9))'Q(9){x - /z(0)) = Yh=i Z)"= i Qij{9)(xi - fu{9))(xj - 
N {9)) where Q(9) = ( Qij{9 )). Collecting the different terms and again substi¬ 
tuting 


Xj- Hi{9) _i U / Xi-fi(9) V 

[ 1 v am J\ ’ 


the transformed joint log density f{x\ 9,y) is therefore 


log7r(i|0,y) = - log |Q(0)| 


^oEE^b^KO 9 )^ 1 


i= 1 J'=1 


Xi - jli(9) 
<?i{9) 



~ / Zj-iijW )V 


L J v J\ 


+ E l0g & 

i— 1 


(Xi- fii{9)\ 

\ <9) ) 


i =1 ' 


Fi 


fxi- fii(9) V 

l <9) )_ 


+ constant 
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The original (uncorrected) log posterior is 

log7r((%) = log7r(x, 9 , y) — \ognF(x\0, y) + constant, 

evaluated at x = p{0) where log7Ti?(a;|0,J/)| x=Al (e> = |log|<2(0)| + constant. 
Therefore, the version of the copula correction accounting for skewness amounts 
to adding a term C s y ev ,{9) to the original log joint posterior, where 




*=i i=i 

n 

+E lo s^ 


Fi 


Pi{9) - fii{9) 

&i{0) 


C skew (0) = 
Vi( 9 ) ~ frijO) 

n , 

X]l°g^ ( $_1 


$" 


Vj{0) ~ fa (0) 


m 


Fi 


fii(0) - p,j(Q) 
<Ji{0) 


Calculations of Fi and /, : were done using the functions psn and dsn in the 
R package sn (Azzalini, 2015). 


Appendix B: Additional simulation results 
B.l. Model with two dependent random effects 

Here we study model (0.8) from page 11 of the Supplementary Material of Fong, 
Rue and Wakefield (2010), where the observations Y i: j are iid Binomial(pjj, m), 
with i = 1,..., 100 clusters, j = 1,...,7 observations per cluster, Xi = 0 
for i < 50 and Xi = 1 otherwise, and sampling times t = (ti,...,^)' = 
(—3, —2, —1, 0,1, 2, 3)'. The model is 


logit Pij — /?o + f3\tj + @ 2 Xi + pffjXi + boi + butj , (11) 

where the (&oi, bu) 1 are iid bivariate normally distributed with mean (0, 0). Fol¬ 
lowing Fong, Rue and Wakefield (2010), the prior for the precision matrix of 
{boi,buY is a Wishart distribution with three degrees of freedom and diagonal 
scale parameter with diagonal elements 0.17 and 0.025. The fixed effects are 
given jV(0,1000) priors. 

As in Fong, Rue and Wakefield (2010), we shall consider the case when boi 
and bu are uncorrelated, but we shall here also consider the correlated case 
with correlation p = 0.5 and p = 0.9, respectively. Additionally, we consider two 
different settings of the marginal variances of &oi and bn : 

1. Var(6 0 i) = 0.5, Var(6 li ) = 0.25 (as in Fong, Rue and Wakefield (2010)), 

2. Var(& 0i ) = 3.0, Var(&i,) = 0.5. 

For each of the two settings of the marginal variances above, we ran the 
simulation experiment for the three settings of p (0, 0.5 and 0.9 correlation), 
giving six simulation settings in total. For each simulation setting, we made 200 
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simulated data sets, and ran two MCMC chains of 200,000 iterations each (after 
discarding the first 100,000 iterations) for each simulated data set. 

We have yet to specify the number of trials m in the binomial distribution. 
It turns out that this model is nearly unidentifiable for m = 1, with very slow 
MCMC convergence and with numerical instability when running INLA (both 
with and without the correction). Therefore, we will here consider m > 1, and 
show the results for m = 2. Results (not shown) are similar also for m > 2. As 
expected, the estimation becomes more accurate as m grows, and for large m 
(say, m > 10) there is no need for the INLA correction anymore. 

Results are shown in Tables 3-8 below, where we use the parameterization 
(9i,9 2 ,9 3 ) used internally by INLA, where 9\ = log^^ 2 ), 9 2 = log(crf 2 ) and 

9 3 = log (note that 0\, 9 2 and 9 3 are defined on the whole real line). It 

seems like the correction is working quite well, giving an overall improvement. 
The coverage probabilities are improved in all cases expect for the log 
with p < 0.5, so we see an improvement for 38 of the 6*7 = 42 combinations of 
parameters and simulation settings. The variance ratio Var(INLA)/Var(MCMC) 
is also improved for nearly all the cases, while the other performance measures 
show an overall (though not uniform) improvement. The method does not seem 
to deteriorate for higher values of the marginal variances and correlation. 


Table 3 

Results from simulation study (11) with Var(&oi) = 0.5, Var(feii) = 0.25 and p = 0 



Averages of posterior 

means: 





i° g (0 

lo g(°T 2 ) 


) A> 

Pi 

02 

03 

True values 

0.693 

1.386 

0.000 

-2.500 

1.000 

- 1.000 

-0.500 

Uncorrected INLA 

1.527 

2.130 

1.566 

-2.664 

1.021 

-0.692 

-0.428 

Corrected INLA 

1.380 

2.016 

1.313 

-2.703 

1.038 

-0.704 

-0.429 

MCMC 

1.449 

2.022 

1.707 

-2.694 

1.032 

-0.707 

-0.421 

Comparison between INLA 

and MCMC, (E(INLA)-E(MCMC))/sd(MCMC): 


iogfo'o" 2 ) 

i°g(°T 2 ) 


) A) 

Pi 

02 

03 

Uncorrected INLA 

0.113 

0.167 

-0.124 

0.125 

-0.086 

0.042 

-0.038 

Corrected INLA 

-0.086 

-0.028 

-0.329 

-0.028 

0.045 

0.007 

-0.047 

Ratio of variances, Var(INLA)/Var(MCMC): 




i°g( cr cT 2 ) 

logfcrf 2 ) 


) A) 

Pi 

02 

03 

Uncorrected INLA 

0.882 

0.986 

0.927 

0.905 

0.907 

0.937 

0.948 

Corrected INLA 

0.943 

0.974 

0.996 

0.994 

0.982 

0.977 

0.987 

Average coverage of 95% intervals from INLA over 

MCMC samples: 


iogf 0 ^ 2 ) 

i°g(°T 2 ) 


) A) 

pi 

02 

03 

Uncorrected INLA 

92.4% 

93.3% 

92.8% 

93.7% 

93.7% 

94.2% 

94.3% 

Corrected INLA 

92.9% 

93.7% 

90.0% 

93.7% 

93.8% 

94.6% 

94.7% 
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Table 4 

Results from simulation study (11) with Var(boi) = 0.5, Var(bn) = 0.25 and p = 0.5 



Averages of posterior 

means: 





iogOiT 2 ) 

lo g(°T 2 ) 


) ft 

/Si 

ft 

Pi 

True values 

0.693 

1.386 

1.099 

-2.500 

1.000 

-1.000 

-0.500 

Uncorrected INLA 

1.444 

1.866 

2.051 

-2.623 

1.139 

-0.721 

-0.590 

Corrected INLA 

1.366 

1.786 

1.900 

-2.642 

1.148 

-0.730 

-0.594 

MCMC 

1.363 

1.790 

2.176 

-2.649 

1.148 

-0.731 

-0.582 

Comparison between INLA 

and MCMC, (E(INLA)-E(MCMC))/sd(MCMC): 


•og^o" 2 ) 

i°g(°T 2 ) 


) ft 

/Si 

ft 

Pi 

Uncorrected INLA 

0.126 

0.134 

-0.118 

0.111 

-0.073 

0.027 

-0.041 

Corrected INLA 

0.013 

-0.031 

-0.252 

0.035 

-0.005 

0.002 

-0.066 

Ratio of variances, Var(INLA)/ Var(MCMC): 




i°gK -2 ) 

l°g(ft~ 2 ) 

1o s(t^ 

) ft 

/Si 

ft 

Pi 

Uncorrected INLA 

0.891 

0.998 

0.930 

0.904 

0.912 

0.934 

0.953 

Corrected INLA 

0.948 

1.001 

1.043 

0.942 

0.953 

0.952 

0.979 

Average coverage of 95% intervals from INLA over 

MCMC samples: 


log(^o' 2 ) 

l°g(°4 2 ) 

lo s(i^ 

) /So 

/Si 

ft 

Pi 

Uncorrected INLA 

92.8% 

94.1% 

93.1% 

93.8% 

93.9% 

94.2% 

94.4% 

Corrected INLA 

93.7% 

94.8% 

92.8% 

93.9% 

94.1% 

94.4% 

94.6% 


Table 5 

Results from simulation study (11) with Var(boi) = 0.5, Var(bn) = 0.25 and p = 0.9 



Averages of posterior 

means: 





log(^0 _2 ) 

l°g(oT 2 ) 


) ft 

Pi 

ft 

^3 

True values 

0.693 

1.386 

2.944 

-2.500 

1.000 

-1.000 

-0.500 

Uncorrected INLA 

0.700 

1.530 

3.133 

-2.543 

1.027 

-0.947 

-0.460 

Corrected INLA 

0.662 

1.471 

3.062 

-2.553 

1.031 

-0.954 

-0.465 

MCMC 

0.605 

1.466 

3.224 

-2.569 

1.037 

-0.959 

-0.448 

Comparison between INLA 

and MCMC, (E(INLA)-E(MCMC))/sd(MCMC): 


lo g(ftT 2 ) 

lo g(°T 2 ) 


) ft 

Pi 

ft 

Pi 

Uncorrected INLA 

0.195 

0.130 

-0.107 

0.110 

-0.080 

0.034 

-0.059 

Corrected INLA 

0.113 

-0.007 

-0.183 

0.066 

-0.049 

0.013 

-0.087 

Ratio of variances, Var(INLA)/ Var(MCMC): 




lo g(ftT 2 ) 

logftf 2 ) 

lo s(i^ 

) ft 

Pi 

ft 

Pi 

Uncorrected INLA 

0.958 

1.023 

0.943 

0.905 

0.921 

0.920 

0.950 

Corrected INLA 

0.989 

1.050 

1.087 

0.925 

0.951 

0.932 

0.973 

Average coverage of 95% intervals from INLA over 

MCMC samples: 


iogOo" 2 ) 

l°g(ft" 2 ) 


) ft 

Pi 

ft 

Pi 

Uncorrected INLA 

93.4% 

94.7% 

93.7% 

93.9% 

94.1% 

94.0% 

94.3% 

Corrected INLA 

94.4% 

95.4% 

94.6% 

94.1% 

94.4% 

94.2% 

94.5% 
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Table 6 

Results from simulation study (11) with Varfboi) = 3, Var(bu) = 0.5 and p = 0 



Averages of posterior 

means: 





io g(°-(7 2 ) 

i°g(°T 2 ) 

lo g(i^ 

) A> 

Pi 

P 2 

As 

True values 

-1.099 

0.693 

0.000 

-2.500 

1.000 

-1.000 

-0.500 

Uncorrected INLA 

-0.749 

0.962 

-0.243 

-2.274 

0.956 

-0.661 

-0.584 

Corrected INLA 

-0.809 

0.850 

-0.294 

-2.318 

0.978 

-0.672 

-0.592 

MCMC 

-0.844 

0.867 

-0.250 

-2.306 

0.971 

-0.652 

-0.593 

Comparison between INLA 

and MCMC, (E(INLA)-E(MCMC))/sd(MCMC): 


l°g(0 

i°g(°T 2 ) 


) A) 

Pi 

P2 

As 

Uncorrected INLA 

0.335 

0.317 

0.017 

0.101 

-0.106 

-0.021 

0.049 

Corrected INLA 

0.126 

-0.058 

-0.106 

-0.039 

0.047 

-0.050 

0.003 

Ratio of variances, Var(INLA)/Var(MCMC): 




iog^o" 2 ) 

i°g(°T 2 ) 

lo g(U? 

) A) 

Pi 

P 2 

As 

Uncorrected INLA 

0.953 

0.986 

0.984 

0.902 

0.908 

0.886 

0.903 

Corrected INLA 

0.951 

0.969 

0.950 

0.950 

0.978 

0.933 

0.976 

Average coverage of 95% intervals from INLA over 

MCMC samples: 


l°g(0 

i°g(°T 2 ) 

lo g(l^ 

) A) 

Pi 

P2 

Pa 

Uncorrected INLA 

92.9% 

93.2% 

94.8% 

93.9% 

93.9% 

93.6% 

93.8% 

Corrected INLA 

94.2% 

94.7% 

94.3% 

94.3% 

94.6% 

94.2% 

94.7% 



Table 7 





Results from simulation study (11) with Var(bQi) 

= 3, Var(bn) = 0.5 

and p 

= 0.5 


Averages of posterior 

means: 





i°g(0 

lo g(°T 2 ) 

!°g(££ 

) A> 

Pi 

P2 

As 

True values 

-1.099 

0.693 

1.099 

-2.500 

1.000 

-1.000 

-0.500 

Uncorrected INLA 

-0.714 

1.289 

1.778 

-2.151 

1.039 

-1.013 

-0.559 

Corrected INLA 

-0.816 

1.106 

1.348 

-2.229 

1.080 

-1.041 

-0.570 

MCMC 

-0.816 

1.172 

1.643 

-2.166 

1.048 

-1.003 

-0.557 

Comparison between INLA 

and MCMC, (E(INLA)-E(MCMC))/sd(MCMC): 


iog^ 2 ) 

i°g(°T 2 ) 

lo g(l^ 

) A) 

Pi 

P 2 

As 

Uncorrected INLA 

0.336 

0.273 

0.141 

0.045 

-0.061 

-0.022 

- 0.010 

Corrected INLA 

0.002 

-0.170 

-0.306 

-0.203 

0.226 

-0.091 

-0.068 

Ratio of variances, Var(INLA)/Var(MCMC): 




1 °g( cr cT 2 ) 

iogK“ 2 ) 

lo g(l^ 

) A) 

Pi 

P 2 

As 

Uncorrected INLA 

0.954 

1.030 

0.977 

0.892 

0.905 

0.901 

0.934 

Corrected INLA 

0.934 

0.964 

0.761 

0.984 

1.031 

0.976 

1.041 

Average coverage of 95% intervals from INLA over 

MCMC samples: 


iog^o" 2 ) 

lo g(°T 2 ) 

lo g(U? 

) A) 

Pi 

P 2 

Pa 

Uncorrected INLA 

92.8% 

93.5% 

93.7% 

93.5% 

93.6% 

93.7% 

94.2% 

Corrected INLA 

93.3% 

94.2% 

90.5% 

93.7% 

94.2% 

94.6% 

95.3% 
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Table 8 

Results from simulation study (11) with Var(boi) = 3, Var(bn) = 0.5 and p = 0.9 



Averages of posterior 

means: 





lo gO(j" 2 ) 

logK" 2 ) 


) do 

di 

d2 

d3 

True values 

-1.099 

0.693 

2.944 

-2.500 

1.000 

-1.000 

-0.500 

Uncorrected INLA 

-1.087 

1.025 

4.483 

-2.672 

1.015 

-0.876 

-0.590 

Corrected INLA 

-1.160 

0.940 

4.466 

-2.699 

1.018 

-0.902 

-0.606 

MCMC 

-1.169 

0.941 

4.537 

-2.703 

1.039 

-0.845 

-0.572 

Comparison between INLA 

and MCMC, (E(INLA)-E(MCMC))/sd(MCMC): 


•ogfo'o" 2 ) 



) do 

di 

d2 

da 

Uncorrected INLA 

0.288 

0.192 

-0.063 

0.083 

-0.150 

-0.061 

-0.079 

Corrected INLA 

0.032 

-0.008 

-0.075 

0.011 

-0.130 

-0.111 

-0.149 

Ratio of variances, Var(INLA)/ Var(MCMC): 




lo gOo" 2 ) 

logFU 2 ) 

1 o s(t^ 

) do 

di 

d2 

da 

Uncorrected INLA 

0.938 

0.961 

0.883 

0.883 

0.897 

0.887 

0.932 

Corrected INLA 

0.978 

1.019 

1.060 

0.933 

0.943 

0.937 

0.986 

Average coverage of 95% intervals from INLA over 

MCMC samples: 


lo g( CT o" 2 ) 

i°g(°T 2 ) 

lo s(i^ 

) do 

di 

d2 

da 

Uncorrected INLA 

92.1% 

92.6% 

91.4% 

93.5% 

93.5% 

93.5% 

93.9% 

Corrected INLA 

93.5% 

93.9% 

93.3% 

94.1% 

94.1% 

94.0% 

94.3% 


B.2. Model with very small number of observations per cluster 

In the simulation study described in Section 3.1 we followed Fong, Rue and 
Wakefield (2010) and used m = 7 observations per cluster. As suggested by 
a reviewer, we here consider the effect of having an even smaller value of ni. 
We only show the results for the most extreme possible case, which is m = 2. 
Using the close to non-informative prior settings of model (8) in Section 3.1 
{N(0, 1000) priors for the /3j and a Gamma(0.5,0.0164) prior for er~ 2 ), the case 
with ni = 7 is already quite difficult. Using the settings described in Section 3.1, 
the simulated data are relatively low-informative, making stable and reliable 
inference non-trivial. 

In order to study the even more extreme case of m =2, more informative pri¬ 
ors are needed, otherwise both MCMC and INLA will fail. For non-informative 
(or very weakly informative) priors the model is just too close to being singular. 
Therefore, to study the case of m = 2, we use the following priors: IV(0,1) for 
the fa, and also a N(0, 1) prior for the log precision, log(cr -2 ). We used sampling 
times t = (t \, tf)' = (—3,3)', and 200 simulated data sets. One million MCMC 
samples (after a burn-in of 100,000) were used for each data set. 

The results are shown in Table 9. The correction seems to work well, giving 
improved estimates in nearly all cases. Note in particular that the 95% coverage 
is uniformly improved, and that all the coverage values are between 93.7% and 
96.2% when using the correction. 
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Table 9 

Results from simulation study with ni = 2 observations per cluster 



Averages of posterior 

means: 





a 0 

<70 

log °- 0 

0o 

01 

02 

03 

True values 

1.000 

1.000 

0.000 

-2.500 

1.000 

-1.000 

-0.500 

Uncorrected INLA 

0.667 

0.762 

0.689 

-1.990 

0.837 

-1.178 

-0.423 

Corrected INLA 

1.104 

0.933 

0.358 

-2.032 

0.860 

-1.219 

-0.442 

MCMC 

0.956 

0.899 

0.392 

-2.114 

0.891 

-1.230 

-0.427 

Comparison between INLA and MCMC, (E(INLA)-E(MCMC))/sd(MCMC): 


a o 

<70 

log o- 0 

/3o 

01 

02 

03 

Uncorrected INLA 

-0.337 

-0.362 

0.355 

0.250 

-0.295 

0.079 

0.020 

Corrected INLA 

0.149 

0.085 

-0.041 

0.163 

-0.168 

0.013 

-0.059 

Ratio of variances, 

Var(INLA)/Var(MCMC): 




a o 

<70 

log cr 0 2 

00 

01 

02 

03 

Uncorrected INLA 

0.393 

0.592 

0.841 

0.876 

0.823 

0.902 

0.873 

Corrected INLA 

2.030 

1.360 

1.127 

0.920 

0.896 

0.933 

0.907 

Average coverage of 95% intervals from INLA over MCMC samples: 


a o 

<70 

log o- 0 

00 

01 

02 

03 

Uncorrected INLA 

89.2% 

89.2% 

89.5% 

93.5% 

92.6% 

93.7% 

93.3% 

Corrected INLA 

96.2% 

95.9% 

96.0% 

94.2% 

93.8% 

94.1% 

93.7% 


B.3. Simulations with misspecified model 

As suggested by a reviewer, we here study the effect of the case of estimation 
from a misspecified model: We simulate data from the model (9) and estimate us¬ 
ing model (8) (with prior settings as in Section 3.1). As before, we use extremely 
long MCMC chains (one million iterations after discarding 100,000 iterations) 
as the “gold standard”. We simulated 200 data sets from each of the six con¬ 
figurations described in Appendix B.l. The results are shown in Tables 10-15. 
Again, the correction improves the results in nearly all cases, so it does not seem 
like using a misspecified model presents any particular problems for the INLA 
correction. 
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Table 10 

Results from simulation study with incorrectly specified model; configuration with 



Varipoi) 

= 0.5, 

Var(b\i) = 0.25 and p 

= 0 




Averages of posterior 

means: 





a l 

0 "O 

log °o 2 

0o 

01 

02 

03 

Uncorrected INLA 

0.424 

0.536 

1.778 

-2.482 

1.068 

-0.833 

-0.561 

Corrected INLA 

0.591 

0.640 

1.440 

-2.541 

1.093 

-0.847 

-0.579 

MCMC 

0.623 

0.660 

1.374 

-2.533 

1.091 

-0.855 

-0.562 

Comparison between INLA 

and MCMC, (E(INLA)-E(MCMC))/sd(MCMC): 



0"O 

log °o 

00 

01 

02 

03 

Uncorrected INLA 

-0.378 

-0.394 

0.361 

0.150 

-0.147 

0.042 

0.007 

Corrected INLA 

-0.069 

-0.067 

0.058 

-0.019 

0.011 

0.012 

-0.079 

Ratio of variances 

Var(INLA) /Var(MCMC): 




a o 

0-0 

log cr 0 2 

0o 

01 

02 

03 

Uncorrected INLA 

0.492 

0.714 

1.044 

0.834 

0.845 

0.900 

0.898 

Corrected INLA 

0.888 

0.923 

1.002 

0.905 

0.881 

0.948 

0.916 

Average coverage of 95% intervals from INLA over MCMC samples: 


a o 

0-0 

log tr 0 2 

00 

01 

02 

03 

Uncorrected INLA 

87.6% 

87.3% 

88.4% 

92.8% 

92.9% 

93.7% 

93.6% 

Corrected INLA 

93.9% 

93.4% 

93.9% 

93.6% 

93.3% 

94.4% 

93.8% 




Table 11 





Results from simulation study with incorrectly specified model; configuration 

with 


Var(boi) 

= 0.5, Var(b\i) = 0.25 and p = 

= 0.5 




Averages of posterior 

means: 





a 0 

0 'O 

log o- 0 

00 

01 

02 

03 

Uncorrected INLA 

0.983 

0.890 

0.585 

-2.667 

0.973 

-1.060 

-0.220 

Corrected INLA 

1.255 

1.018 

0.282 

-2.737 

0.996 

-1.088 

-0.230 

MCMC 

1.362 

1.066 

0.172 

-2.746 

1.004 

- 1.111 

-0.214 

Comparison between INLA 

and MCMC, (E(INLA)-E(MCMC))/sd(MCMC): 


a l 

0 'O 

log cr 0 2 

00 

01 

02 

03 

Uncorrected INLA 

-0.498 

-0.541 

0.544 

0.202 

-0.202 

0.084 

-0.024 

Corrected INLA 

-0.147 

-0.150 

0.140 

0.017 

-0.050 

0.038 

-0.069 

Ratio of variances 

Var(INLA) /Var(MCMC): 





0-0 

log tr 0 2 

00 

01 

02 

03 

Uncorrected INLA 

0.592 

0.854 

1.318 

0.780 

0.812 

0.850 

0.869 

Corrected INLA 

0.851 

0.943 

1.059 

0.851 

0.842 

0.905 

0.888 

Average coverage of 95% intervals from INLA over MCMC samples: 


a o 

0-0 

log <r 0 2 

00 

01 

02 

03 

Uncorrected INLA 

89.3% 

89.1% 

89.6% 

91.8% 

92.2% 

93.0% 

93.3% 

Corrected INLA 

94.0% 

93.8% 

94.1% 

93.1% 

92.9% 

93.8% 

93.6% 
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Table 12 

Results from simulation study with incorrectly specified model; configuration with 
Var(boi) = 0.5, Var(bu) = 0.25 and p = 0.9 



Averages of posterior 

means: 





a l 

O’o 

log o- 0 2 

a 

Pi 

P 2 

Ps 

Uncorrected INLA 

1.071 

0.937 

0.451 

-2.621 

1.097 

-1.243 

-0.478 

Corrected INLA 

1.447 

1.104 

0.073 

-2.713 

1.133 

-1.287 

-0.501 

MCMC 

1.471 

1.118 

0.037 

-2.698 

1.132 

-1.295 

-0.478 

Comparison between INLA 

and MCMC, (E(INLA)-E(MCMC))/sd(MCMC): 



<70 

log 

Po 

Pi 

P 2 

P3 

Uncorrected INLA 

-0.496 

-0.538 

0.542 

0.192 

-0.208 

0.081 

0.004 

Corrected INLA 

-0.039 

-0.042 

0.041 

-0.046 

0.015 

0.008 

-0.099 

Ratio of variances 

Var(INLA)/Var(MCMC): 




a o 

<70 

log cr 0 2 

Po 

Pi 

P 2 

P3 

Uncorrected INLA 

0.596 

0.854 

1.308 

0.775 

0.783 

0.842 

0.855 

Corrected INLA 

0.959 

0.989 

1.027 

0.871 

0.830 

0.918 

0.881 

Average coverage of 95% intervals from INLA over MCMC samples: 


a o 

O’o 

log <r 0 2 

A) 

Pi 

P 2 

Ps 

Uncorrected INLA 

89.5% 

89.4% 

89.8% 

91.7% 

91.6% 

92.9% 

93.0% 

Corrected INLA 

95.0% 

94.8% 

95.0% 

93.2% 

92.6% 

94.0% 

93.4% 




Table 13 





Results from simulation study with incorrectly specified model; configuration 

with 


Var(b oi ) = 3, 

Var(b\i) = 0.5 and p = 

0 




Averages of posterior 

means: 





CT 0 

O'O 

log O-Q 

Po 

Pi 

P2 

P3 

Uncorrected INLA 

2.644 

1.587 

-0.873 

-2.461 

0.960 

-0.626 

-0.486 

Corrected INLA 

2.938 

1.671 

-0.976 

-2.508 

0.977 

-0.637 

-0.497 

MCMC 

3.161 

1.736 

-1.053 

-2.546 

0.998 

-0.623 

-0.505 

Comparison between INLA 

and MCMC, (E(INLA)-E(MCMC))/sd(MCMC): 


a l 

O’o 

log A) 

Po 

Pi 

P 2 

P3 

Uncorrected INLA 

-0.471 

-0.500 

0.523 

0.215 

-0.271 

-0.005 

0.103 

Corrected INLA 

-0.211 

-0.219 

0.225 

0.096 

-0.153 

-0.026 

0.043 

Ratio of variances 

Var(INLA)/Var(MCMC): 





<70 

log <t 0 

Po 

Pi 

P 2 

P3 

Uncorrected INLA 

0.683 

0.828 

1.021 

0.788 

0.786 

0.826 

0.838 

Corrected INLA 

0.848 

0.921 

1.018 

0.843 

0.811 

0.880 

0.858 

Average coverage of 95% intervals from INLA over MCMC samples: 


a o 

<70 

log o- 0 2 

Po 

Pi 

P2 

P3 

Uncorrected INLA 

91.1% 

91.1% 

91.4% 

91.8% 

91.5% 

92.6% 

92.6% 

Corrected INLA 

94.2% 

94.2% 

94.4% 

93.0% 

92.4% 

93.4% 

93.1% 
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Table 14 

Results from simulation study with incorrectly specified model; configuration with 
Var(boi) = 3 , Varfbu) = 0.5 and p — 0.5 



Averages of posterior 

means: 





a l 


log o- 0 2 

Po 

Pi 

P2 

03 

Uncorrected INLA 

3.246 

1.749 

- 1.056 

- 3.025 

0.936 

- 0.642 

- 0.182 

Corrected INLA 

3.821 

1.888 

- 1.203 

- 3.117 

0.957 

- 0.676 

- 0.186 

MCMC 

3.990 

1.938 

- 1.261 

- 3.144 

0.978 

- 0.663 

- 0.183 

Comparison between INLA 

and MCMC, (E(INLA)-E(MCMC))/sd(MCMC): 



cro 

log <r 0 

Po 

Pi 

02 

03 

Uncorrected INLA 

- 0.506 

- 0.541 

0.570 

0.251 

- 0.277 

0.032 

0.009 

Corrected INLA 

- 0.144 

- 0.154 

0.161 

0.063 

- 0.139 

- 0.017 

- 0.012 

Ratio of variances 

Var(INLA) /Var(MCMC): 




a o 

0-0 

log cr 0 2 

Pa 

Pi 

02 

03 

Uncorrected INLA 

0.643 

0.802 

1.020 

0.761 

0.793 

0.806 

0.830 

Corrected INLA 

0.922 

0.973 

1.050 

0.846 

0.826 

0.885 

0.860 

Average coverage of 95% intervals from INLA over MCMC samples: 


a o 

0-0 

log tr 0 2 

00 

Pi 

p2 

03 

Uncorrected INLA 

90.4% 

90 . 4 % 

90 . 7 % 

91 . 2 % 

91 . 6 % 

92 . 3 % 

92 . 6 % 

Corrected INLA 

94 . 8 % 

94 . 8 % 

95 . 0 % 

93 . 0 % 

92 . 7 % 

93 . 5 % 

93 . 1 % 




Table 15 





Results from simulation study with incorrectly specified model; configuration 

with 


Var(boi) = 3 , Var(bu) = 0.5 and p = 

0.9 




Averages of posterior 

means: 





CT 0 

0 'O 

log <r 0 

Po 

Pi 

02 

03 

Uncorrected INLA 

3.622 

1.854 

- 1.180 

- 3.170 

1.160 

- 0.564 

- 0.213 

Corrected INLA 

4.118 

1.973 

- 1.302 

- 3.253 

1.185 

- 0.582 

- 0.220 

MCMC 

4.468 

2.058 

- 1.389 

- 3.312 

1.218 

- 0.576 

- 0.222 

Comparison between INLA 

and MCMC, (E(INLA)-E(MCMC))/sd(MCMC): 



0-0 

logo-Q 2 

Po 

Pi 

02 

03 

Uncorrected INLA 

- 0.526 

- 0.564 

0.595 

0.276 

- 0.326 

0.017 

0.037 

Corrected INLA 

- 0.229 

- 0.240 

0.248 

0.117 

- 0.186 

- 0.008 

0.007 

Ratio of variances 

Var(INLA) /Var(MCMC): 





0-0 

log tr 0 2 

0o 

Pi 

02 

03 

Uncorrected INLA 

0.641 

0.802 

1.021 

0.751 

0.755 

0.797 

0.804 

Corrected INLA 

0.844 

0.925 

1.033 

0.817 

0.788 

0.860 

0.832 

Average coverage of 95% intervals from INLA over MCMC samples: 


a o 

0-0 

log <r 0 2 

00 

Pi 

02 

03 

Uncorrected INLA 

90.1% 

90 . 1 % 

90 . 4 % 

90 . 9 % 

90 . 6 % 

92 . 1 % 

92 . 1 % 

Corrected INLA 

94.3% 

94 . 2 % 

94 . 5 % 

92 . 6 % 

91 . 9 % 

93 . 1 % 

92 . 7 % 
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